A Dataset and Analysis of Open-Source Machine Learning Products
Machine learning (ML) components are increasingly incorporated into software
products, yet developers face challenges in transitioning from ML prototypes to
products. Academic researchers struggle to propose solutions to these
challenges and evaluate interventions because they often do not have access to
closed-source ML products from industry. In this study, we define and identify
open-source ML products, curating a dataset of 262 repositories from GitHub, to
facilitate further research and education. As a start, we explore six broad
research questions related to different development activities and report 21
findings from a sample of 30 ML products from the dataset. Our findings reveal
a variety of development practices and architectural decisions surrounding
different types and uses of ML models that offer ample opportunities for future
research innovations. We also find very little evidence of industry best
practices such as model testing and pipeline automation within the open-source
ML products, which leaves room for further investigation to understand its
potential impact on the development and eventual end-user experience for the
products.
An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset
The fork-based development mechanism provides the flexibility and the unified
processes for software teams to collaborate easily in a distributed setting
without too much coordination overhead. Currently, multiple social coding
platforms support fork-based development, such as GitHub, GitLab, and
Bitbucket. Although these platforms share virtually the same
features, they have different emphases. As GitHub is the most popular platform
and the corresponding data is publicly available, most current studies
focus on GitHub-hosted projects. However, we observed anecdotal evidence
that people are confused about choosing among these platforms, and some
projects are migrating from one platform to another, and the reasons behind
these activities remain unknown. With the advances of the Software Heritage Graph
Dataset (SWHGD), we have the opportunity to investigate forking activities
across platforms. In this paper, we conduct an exploratory study on 10 popular
open-source projects to identify cross-platform forks and investigate the
motivation behind them. Preliminary results show that cross-platform forks do exist.
For the 10 subject systems in this study, we found 81,357 forks in total among
which 179 forks are on GitLab. Based on our qualitative analysis, we found that
most of the cross-platform forks that we identified are mirrors of the
repositories on another platform, but we also found cases that were created
because of a preference for certain functionalities (e.g., Continuous Integration
(CI)) supported by different platforms. This study lays the foundation for
future research directions, such as understanding the differences between
platforms and supporting cross-platform collaboration.
Comment: Accepted at the 17th International Conference on Mining Software
Repositories, October 5–6, 2020, Seoul, Republic of Korea
Blood pressure and expression of microRNAs in whole blood
Background: Blood pressure (BP) is a complex, multifactorial clinical outcome driven by genetic susceptibility, behavioral choices, and environmental factors. Many molecular mechanisms have been proposed for the pathophysiology of high BP even as its prevalence continues to grow worldwide, increasing morbidity and marking it as a major public health concern. To address this, we evaluated miRNA profiles in blood leukocytes as potential biomarkers of BP and BP-related risk factors. Methods: The Beijing Truck Driver Air Pollution Study included 60 truck drivers and 60 office workers examined in 2008. On two days separated by 1–2 weeks, we examined three BP measures (systolic, diastolic, and mean arterial pressure), taken at both pre- and post-work exams, alongside blood NanoString nCounter miRNA profiles. We used covariate-adjusted linear mixed-effect models to examine associations between BP and increased miRNA expression in both pooled and risk factor-stratified analyses. Results: Overall, 43 miRNAs were associated with pre-work BP (FDR<0.05). In stratified analyses, different but overlapping groups of miRNAs were associated with pre-work BP in truck drivers, high-BMI participants, and usual alcohol drinkers (FDR<0.05). Only four miRNAs were associated with post-work BP (FDR<0.05), in ever smokers. Conclusion: Our results suggest that many miRNAs were significantly associated with BP in subgroups exposed to known hypertension risk factors. These findings shed light on the underlying molecular mechanisms of BP, and may assist with the development of a miRNA panel for early detection of hypertension.
The causal relationship between COVID-19 and seventeen common digestive diseases: a two-sample, multivariable Mendelian randomization study
Objectives: In clinical practice, digestive symptoms such as nausea and vomiting are frequently observed in COVID-19 patients. However, the causal relationship between COVID-19 and digestive diseases remains unclear. Methods: We extracted single nucleotide polymorphisms associated with the severity of COVID-19 from summary data of genome-wide association studies. Summary statistics of common digestive diseases were primarily obtained from the UK Biobank study and the FinnGen study. Two-sample Mendelian randomization analyses were then conducted using the inverse variance-weighted (IVW), Mendelian randomization-Egger regression (MR Egger), weighted median estimation, weighted mode, and simple mode methods. IVW served as the primary analysis method, and multivariable Mendelian randomization analysis was employed to explore the mediating effect of body mass index (BMI) and type 2 diabetes. Results: MR analysis showed a causal association of SARS-CoV-2 infection (OR = 1.09, 95% CI 1.01–1.18, P = 0.03), severe COVID-19 (OR = 1.02, 95% CI 1.00–1.04, P = 0.02), and COVID-19 hospitalization (OR = 1.04, 95% CI 1.01–1.06, P = 0.01) with gastroesophageal reflux disease (GERD). Mediation analysis indicated that BMI served as the primary mediating variable in the causal relationship between SARS-CoV-2 infection and GERD, with BMI mediating 36% (95% CI 20–53%) of the effect. Conclusions: We found a causal relationship between SARS-CoV-2 infection and gastroesophageal reflux disease. Furthermore, we found that the causal relationship between SARS-CoV-2 infection and GERD is mainly mediated by BMI.
LncRNA BCAR4, targeting to miR-665/STAT3 signaling, maintains cancer stem cells stemness and promotes tumorigenicity in colorectal cancer
Background: Breast cancer anti-estrogen resistance 4 (BCAR4) is closely associated with colorectal cancer (CRC) initiation and propagation. However, the mechanisms underlying BCAR4 function in colon cancer remain largely unknown. In this study, we hypothesized that BCAR4 could regulate colon cancer stem/initiating cell (CSC) function and further facilitate colon cancer progression. Methods: qRT-PCR was used to examine the expression of BCAR4 and various CSC markers. FACS, acetaldehyde dehydrogenase (ALDH) activity and western blot assays were used to test the expression of CSC markers. CCK8, tumorsphere formation and transwell assays were adopted to examine the proliferation, self-renewal and migration capacity of CRC cells. A pull-down assay was used to test the interaction between BCAR4 and miR-665. A luciferase reporter assay was used to examine the interaction of miR-665 and signal transducer and activator of transcription 3 (STAT3). An in vivo tumor xenograft study was used to verify the malignancy of CRC cells with inhibition of BCAR4. Results: BCAR4 was highly expressed in both CRC cells and stem/initiating cells. In addition, overexpression of BCAR4 facilitated the maintenance of stemness in ALDH-positive cells (a type of cancer stem/initiating cell) and promoted ALDH+ cell proliferation and migration. Inhibition of BCAR4 restricted ALDH+ cell proliferation and migration. We further showed that miR-665 was the target of BCAR4 and that this targeting subsequently activated STAT3 signaling, an important pathway in cancer stem cell self-renewal. Conclusions: BCAR4 promotes CRC cell stemness by targeting miR-665/STAT3 signaling, and the identification of BCAR4 in CRC stem cells provides new insight into CRC diagnosis, treatment, prognosis and next-step translational investigations.
Pancancer analysis uncovers an immunological role and prognostic value of the m6A reader IGF2BP2 in pancreatic cancer
Introduction: Pancreatic ductal adenocarcinoma (PDAC) is one of the most malignant gastrointestinal tumors worldwide, with a dismal prognosis and high relapse rate. PDAC is considered a “cold cancer” for which immunotherapy is not effective. Therefore, to improve the prognosis for PDAC patients, it is urgent to explore the mechanism driving its insensitivity to immunotherapy. Materials and methods: We conducted pancancer analyses to test IGF2BP family expression and survival in patients with different cancers via the TCGA and GTEx databases. Then, we determined the immunological role and prognostic value of IGF2BP2 in vitro, in vivo and in clinical specimens. Results: In the present study, we found that the m6A reader IGF2BP2 was the most clinically relevant member of the IGF2BP family for pancreatic cancer. High expression of IGF2BP2 was most associated with poor prognosis and an immunosuppressive microenvironment in PDAC. After IGF2BP2 knockdown, tumor cell proliferation and invasive ability were significantly diminished. Importantly, we found that IGF2BP2 expression was closely associated with high expression of immunosuppressive molecules such as PD-L1. IGF2BP2 modulated downstream PD-L1 expression by regulating its mRNA stability via m6A methylation control, and we verified this in animal experiments and human tissue specimens. Conclusion: Our study contributes to existing knowledge regarding the IGF2BP2-regulated PD-L1 signaling pathway as a potential prognostic and immune biomarker in pancreatic cancer.
Extracting Configuration Knowledge from Build Files with Symbolic Analysis
Build systems contain a lot of configuration knowledge about a software system, such as under which conditions specific files are compiled. Extracting such configuration knowledge is important for many tools analyzing highly-configurable systems, but very challenging due to the complex nature of build systems. We design an approach, based on SYMake, that symbolically evaluates Make files and extracts configuration knowledge in terms of file presence conditions and conditional parameters. We implement an initial prototype and demonstrate feasibility on small examples.